SUMMARY

•Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.
•Effective and scalable methods have been developed for decision tree induction, naive Bayesian classification, Bayesian belief networks, rule-based classifiers, backpropagation, support vector machines (SVM), associative classification, nearest-neighbor classifiers, and case-based reasoning, as well as for other classification methods such as genetic algorithms and rough set and fuzzy set approaches.
•Linear, nonlinear, and generalized linear models of regression can be used for prediction.  Many nonlinear problems can be converted to linear problems by performing transformations on the predictor variables.  Regression trees and model trees are also used for prediction. 
•Stratified k-fold cross-validation is a recommended method for accuracy estimation.  Bagging and boosting can be used to increase overall accuracy by learning and combining a series of individual models.
•Significance tests and ROC curves are useful for model selection.
•There have been numerous comparisons of the different classification and prediction methods, and the matter remains a research topic.
•No single method has been found to be superior to all others for all data sets.
•Issues such as accuracy, training time, robustness, interpretability, and scalability must be considered and can involve trade-offs, further complicating the quest for an overall superior method.
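
To make one of the listed methods concrete, here is a minimal sketch of naive Bayesian classification for categorical attributes. It assumes Laplacian (add-one) smoothing, so that an attribute value never seen with a class does not force that class's probability to zero. The function name train_naive_bayes and the toy weather data are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Categorical naive Bayes with Laplacian (add-one) smoothing.

    rows: list of attribute-value tuples; labels: the class of each row.
    Returns a predict(row) function. (Hypothetical helper for illustration.)
    """
    n = len(labels)
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> values seen in training
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, c)][v] += 1
            distinct[j].add(v)

    def predict(row):
        best_class, best_score = None, -1.0
        for c, n_c in class_counts.items():
            score = n_c / n  # prior P(C)
            for j, v in enumerate(row):
                # smoothed estimate of P(x_j = v | C); attributes are assumed
                # conditionally independent given the class
                score *= (value_counts[(j, c)][v] + 1) / (n_c + len(distinct[j]))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    return predict


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
predict = train_naive_bayes(rows, labels)
print(predict(("rain", "mild")))   # yes
print(predict(("sunny", "hot")))   # no
```

With many attributes, summing log probabilities instead of multiplying would avoid numeric underflow; the plain product is kept here for readability.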
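
The point that nonlinear problems can often be made linear by transforming the predictors can be illustrated with polynomial regression: y = w0 + w1*x + w2*x^2 is nonlinear in x but linear in the new variables x1 = x and x2 = x^2, so ordinary least squares applies unchanged. This is a sketch, not a production solver: it assumes a small, well-conditioned problem and solves the normal equations by Gauss-Jordan elimination. The helper names fit_linear and fit_polynomial and the sample data are hypothetical.

```python
def fit_linear(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    rows: list of predictor vectors; an intercept column of 1s is prepended.
    Returns [w0, w1, ..., wm]. Assumes X^T X is non-singular.
    """
    X = [[1.0] + list(r) for r in rows]
    m = len(X[0])
    n = len(X)
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(m)]
         + [sum(X[i][a] * ys[i] for i in range(n))] for a in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def fit_polynomial(xs, ys, degree):
    """Transform the single predictor x into x1 = x, x2 = x**2, ..., then fit linearly."""
    rows = [[x ** p for p in range(1, degree + 1)] for x in xs]
    return fit_linear(rows, ys)


# Hypothetical data generated exactly by y = 1 + 2x + 3x^2
w = fit_polynomial([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], degree=2)
print([round(c, 6) for c in w])  # [1.0, 2.0, 3.0]
```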
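
The cross-validation recommendation can be sketched as follows. Stratification means each fold receives roughly the same class proportions as the whole data set; this sketch achieves that by shuffling each class separately and dealing its members round-robin across the k folds. A 1-nearest-neighbor classifier (one of the nearest neighbor classifiers mentioned above) stands in as the model being evaluated, and the accuracy estimate pools the correct classifications over all k iterations and divides by the total number of tuples. All function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Split row indices into k test folds with roughly the class mix of the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, c in enumerate(labels):
        by_class[c].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for c in sorted(by_class):                 # sorted only for reproducibility
        members = by_class[c]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)  # deal members round-robin
        offset += len(members)                 # next class continues where this one stopped
    return folds


def one_nn_predict(train_X, train_y, x):
    """1-nearest-neighbor: label of the closest training point (squared Euclidean)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[nearest]


def cross_val_accuracy(X, y, k=10, seed=0):
    """Each fold serves once as the test set; accuracy = total correct / total tuples."""
    correct = 0
    for test_idx in stratified_k_fold(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
    return correct / len(y)


# Hypothetical toy data: two well-separated clusters
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(stratified_k_fold(y, 4))        # each fold holds one "a" index and one "b" index
print(cross_val_accuracy(X, y, k=4))  # 1.0
```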
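
Bagging can be sketched in a few lines: each model is trained on a bootstrap sample (n tuples drawn with replacement from the n training tuples), and the ensemble predicts by majority vote. The base learner here is a one-feature decision stump, chosen only to keep the example short; bagging works with any classifier. The function names, the 0/1 label encoding, and the toy data are hypothetical. Boosting is not shown; it differs in that it reweights the training tuples so later models focus on earlier mistakes, and it weights each model's vote by its accuracy.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a one-feature decision stump: the threshold and label orientation
    with the fewest training errors (labels are assumed to be 0 or 1)."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]  # the class with the most votes

    return predict


# Hypothetical toy data: class 1 for larger x
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging(xs, ys)
print(ensemble(2), ensemble(7))
```

An odd number of models avoids ties in the two-class vote.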
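
For ROC curves, a minimal sketch: rank the test tuples by the classifier's score, lower the decision threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The area under the resulting curve (AUC) summarizes it, with 1.0 for a perfect ranking and about 0.5 for random guessing. The function names and scores are hypothetical, and the sketch assumes binary labels with at least one positive and one negative tuple.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the decision threshold through the scores.

    scores: classifier confidence that a tuple is positive; labels: 1 = positive, 0 = negative.
    Tuples with tied scores are added in a single step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:  # consume all tuples tied at s
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical scores for four test tuples
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

The 0.75 equals the fraction of (positive, negative) pairs in which the positive tuple is scored higher: 3 of 4.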
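
Significance testing for model selection can be illustrated with a paired t-test: when two models are evaluated on the same k cross-validation folds, the per-fold differences in error rate are tested against a mean of zero. This sketch uses the standard paired t-test with the sample variance; the function name and the error rates are hypothetical.

```python
import math
import statistics


def paired_t_statistic(err1, err2):
    """t statistic for paired error rates of two models measured on the same k folds.

    Uses the sample variance (k - 1 in the denominator) and assumes the per-fold
    differences are not all identical. Compare |t| with the critical value of the
    t distribution with k - 1 degrees of freedom.
    """
    k = len(err1)
    diffs = [a - b for a, b in zip(err1, err2)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))


# Hypothetical per-fold error rates for models M1 and M2 over the same 4 folds
t = paired_t_statistic([0.10, 0.12, 0.11, 0.13], [0.12, 0.14, 0.12, 0.16])
print(round(t, 3))  # -4.899
```

With k - 1 = 3 degrees of freedom, the two-sided critical value at the 5% significance level is about 3.182; since |t| is about 4.9, the difference between these hypothetical error rates would be judged significant.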

classification and prediction by v. vanthana